Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech
Corrêa, Pedro, Lima, João, Moreno, Victor, Ueda, Lucas, Costa, Paula Dornhofer Paro
ABSTRACT Advancements in spoken language processing have driven the development of spoken language models (SLMs), designed to achieve universal audio understanding by jointly learning text and audio representations for a wide range of tasks. Although promising results have been achieved, there is growing discussion regarding these models' generalization capabilities and the extent to which they truly integrate audio and text modalities in their internal representations. In this work, we evaluate four SLMs on the task of speech emotion recognition using a dataset of emotionally incongruent speech samples, a condition under which the semantic content of the spoken utterance conveys one emotion while speech expressiveness conveys another. Our results indicate that SLMs rely predominantly on textual semantics rather than speech emotion to perform the task, suggesting that text-derived representations largely dominate over acoustic representations. We release both the code and the Emotionally Incongruent Synthetic Speech dataset (EMIS) to the community.
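The evaluation the abstract describes can be sketched in a few lines: on emotionally incongruent samples, count how often a model's predicted emotion matches the text emotion versus the speech emotion. This is an illustrative sketch, not the paper's released code; the sample fields and values are hypothetical.

```python
# Hypothetical sketch: measure text-alignment vs. speech-alignment of
# model predictions on emotionally incongruent samples. Each sample
# carries the emotion of its text, the emotion of its speech
# expressiveness, and the model's prediction.

def alignment_rates(samples):
    """Return (fraction matching text emotion, fraction matching speech emotion)."""
    n = len(samples)
    text_hits = sum(s["predicted"] == s["text_emotion"] for s in samples)
    speech_hits = sum(s["predicted"] == s["speech_emotion"] for s in samples)
    return text_hits / n, speech_hits / n

# Toy incongruent samples (invented for illustration):
samples = [
    {"text_emotion": "happy", "speech_emotion": "sad", "predicted": "happy"},
    {"text_emotion": "angry", "speech_emotion": "neutral", "predicted": "angry"},
    {"text_emotion": "sad", "speech_emotion": "happy", "predicted": "happy"},
]
text_rate, speech_rate = alignment_rates(samples)
print(text_rate, speech_rate)  # 2/3 text-aligned, 1/3 speech-aligned
```

A text-alignment rate well above the speech-alignment rate is the signature of the text-dominance effect the paper reports.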
Perceptual Evaluation of GANs and Diffusion Models for Generating X-rays
Schuit, Gregory, Parra, Denis, Besa, Cecilia
Generative image models have achieved remarkable progress in both natural and medical imaging. In the medical context, these techniques offer a potential solution to data scarcity--especially for low-prevalence anomalies that impair the performance of AI-driven diagnostic and segmentation tools. However, questions remain regarding the fidelity and clinical utility of synthetic images, since poor generation quality can undermine model generalizability and trust. In this study, we evaluate the effectiveness of state-of-the-art generative models--Generative Adversarial Networks (GANs) and Diffusion Models (DMs)--for synthesizing chest X-rays conditioned on four abnormalities: Atelectasis (AT), Lung Opacity (LO), Pleural Effusion (PE), and Enlarged Cardiac Silhouette (ECS). Using a benchmark composed of real images from the MIMIC-CXR dataset and synthetic images from both GANs and DMs, we conducted a reader study with three radiologists of varied experience. Participants were asked to distinguish real from synthetic images and assess the consistency between visual features and the target abnormality. Our results show that while DMs generate more visually realistic images overall, GANs can yield better accuracy for specific conditions, such as absence of ECS. We further identify visual cues radiologists use to detect synthetic images, offering insights into the perceptual gaps in current models. These findings underscore the complementary strengths of GANs and DMs and point to the need for further refinement to ensure generative models can reliably augment training datasets for AI diagnostic systems.
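The core reader-study analysis reduces to per-generator accuracy: for each image source (real, GAN, DM), how often did readers correctly call it real or synthetic? A minimal sketch, assuming hypothetical field names for the rating records:

```python
# Hypothetical sketch of the reader-study tally, not the study's code.
# Each rating records the image source ('real', 'gan', 'dm') and the
# reader's judgment of whether it is real.
from collections import defaultdict

def accuracy_by_source(ratings):
    """Fraction of correct real/synthetic calls, grouped by image source."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in ratings:
        is_real = r["source"] == "real"
        total[r["source"]] += 1
        if r["judged_real"] == is_real:
            correct[r["source"]] += 1
    return {src: correct[src] / total[src] for src in total}

# Toy ratings (invented for illustration):
ratings = [
    {"source": "real", "judged_real": True},
    {"source": "gan", "judged_real": False},
    {"source": "gan", "judged_real": True},
    {"source": "dm", "judged_real": True},
]
print(accuracy_by_source(ratings))
```

Low accuracy on a synthetic source means readers mistook those images for real ones, i.e. higher perceptual realism for that generator.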
Reviews: HYPE: A Benchmark for Human eYe Perceptual Evaluation of Generative Models
This paper introduces a framework to evaluate the perceptual realism of samples from generative models. The framework, HYPE (Human Eye Perceptual Evaluation), is based on psychophysics methods. Two different metrics are proposed. The first one, HYPE_time, measures the amount of time a human needs before distinguishing a real sample from a fake one. The metric is clearly defined and very well grounded in psychophysics.
Voice Passing: A Non-Binary Voice Gender Prediction System for Evaluating Transgender Voice Transition
Doukhan, David, Devauchelle, Simon, Girard-Monneron, Lucile, Ruz, Mía Chávez, Chaddouk, V., Wagner, Isabelle, Rilliard, Albert
This paper presents software that describes voices using a continuous Voice Femininity Percentage (VFP). This system is intended for transgender speakers during their voice transition and for voice therapists supporting them in this process. A corpus of 41 French cis- and transgender speakers was recorded. A perceptual evaluation allowed 57 participants to estimate the VFP for each voice. Binary gender classification models were trained on external gender-balanced data and used on overlapping windows to obtain average gender prediction estimates, which were calibrated to predict VFP and obtained higher accuracy than $F_0$ or vocal tract length-based models. Training data speaking style and DNN architecture were shown to impact VFP estimation. Accuracy of the models was affected by speakers' age. This highlights the importance of style, age, and the conception of gender as binary or not, to build adequate statistical representations of cultural concepts.
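The windowed averaging step described above can be sketched as follows: a binary classifier emits a per-window probability, and averaging those scores across overlapping windows yields a continuous percentage. The probabilities below are invented stand-ins, not outputs of the paper's DNN, and the calibration step is omitted.

```python
# Illustrative sketch: turn per-window binary gender probabilities
# into a continuous Voice Femininity Percentage by averaging.

def voice_femininity_percentage(window_probs):
    """Average per-window P(female) scores, expressed as a percentage."""
    return 100.0 * sum(window_probs) / len(window_probs)

# Hypothetical P(female) scores from overlapping analysis windows:
probs = [0.9, 0.8, 0.85, 0.7]
print(round(voice_femininity_percentage(probs), 2))  # 81.25
```

Averaging over many windows is what converts a hard binary classifier into the graded, continuous measure the system needs.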
Perceptual Evaluation of a Music Source Separation CNN Trained With Binaural and Ambisonic Audio
This research explores the idea of using different spatial audio formats for training music source separation neural networks. DeepConvSep, a library designed by Marius Miron, Pritish Chandna, Gerard Erruz, and Hector Martel, is used as a framework for testing different convolutional neural networks for source separation. A listening test is then detailed and its results are analyzed in order to perform a perceptual evaluation of the models. Conclusions are drawn regarding the effectiveness of using spatial audio formats for training source separation neural networks. Broadly, neural networks for audio aim to let artificial intelligence hear and produce sound much as a human does.
Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data
Lorenzo-Trueba, Jaime, Fang, Fuming, Wang, Xin, Echizen, Isao, Yamagishi, Junichi, Kinnunen, Tomi
Thanks to the growing availability of spoofing databases and rapid advances in using them, systems for detecting voice spoofing attacks are becoming more and more capable, and error rates close to zero are being reached for the ASVspoof2015 database. However, speech synthesis and voice conversion paradigms that are not considered in the ASVspoof2015 database are appearing. Such examples include direct waveform modelling and generative adversarial networks. We also need to investigate the feasibility of training spoofing systems using only low-quality found data. For that purpose, we developed a generative adversarial network-based speech enhancement system that improves the quality of speech data found in publicly available sources. Using the enhanced data, we trained state-of-the-art text-to-speech and voice conversion models and evaluated them in terms of perceptual speech quality and speaker similarity. The results show that the enhancement models significantly improved the SNR of low-quality degraded data found in publicly available sources and that they significantly improved the perceptual cleanliness of the source speech without significantly degrading the naturalness of the voice. However, the results also show limitations when generating speech with the low-quality found data.
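The SNR improvement reported above is measured in decibels from the relative power of signal and noise. A minimal sketch of that computation, assuming the clean reference and the noise residual are available as sample arrays (names and values are illustrative):

```python
# Minimal SNR-in-dB computation: 10 * log10(P_signal / P_noise),
# where each power is the mean of the squared samples.
import math

def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels from raw sample sequences."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)

# Toy example: noise amplitude one tenth of the signal's -> 20 dB.
print(round(snr_db([1.0, -1.0, 1.0, -1.0], [0.1, -0.1, 0.1, -0.1]), 2))  # 20.0
```

An enhancement model "improving the SNR" means this ratio rises after enhancement, i.e. the noise power shrinks relative to the speech power.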